feat: add NAESatisfiability → SetSplitting reduction (#841) [full pipeline v2] by zazabap · Pull Request #985 · CodingThrust/problem-reductions

zazabap · 2026-04-01T10:47:29Z

Summary

Implements NAESatisfiability → SetSplitting using the updated /verify-reduction → /add-reduction pipeline. Compare with PR #984 (old pipeline) to see the differences.

What's new vs PR #984

Aspect	PR #984 (old)	This PR (new)
Type gate	Not checked	`Or` → `Or` verified in Step 1
Test vectors JSON	Not exported	Exported, used for Rust test generation
Canonical example in example_db	Missing (#974 Check 9)	Included in `rule_builders.rs`
Example-db lookup test	Missing (#974 Check 10)	Included in `example_db.rs`
Verification artifact cleanup	Artifacts left in repo	Cleaned up — Python/Typst/PDF removed
Tests	5 tests	7 tests (added all-assignments round-trip, larger instance)

Commit structure

168f6be — verification artifacts (Typst proof + Python scripts from PR docs: verify reduction #841 — NAESatisfiability → SetSplitting #980)
17e9bca — Rust implementation + example_db + tests + artifact cleanup

The second commit shows the clean pattern: -6942 lines (artifacts removed) + 324 lines (Rust added).

Pipeline flow demonstrated

Issue #841 → /verify-reduction (type gate: Or→Or ✓)
           → Typst proof + constructor (32K checks) + adversary (46K checks)
           → test vectors JSON exported
           → /add-reduction (Rust from verified Python spec)
             → Step 1: reduce_to() + extract_solution()
             → Step 3: 7 tests from test vectors
             → Step 4: canonical example in rule_builders.rs
             → Step 4b: example-db lookup test
             → Step 7: verification artifacts cleaned up

Test plan

cargo test naesatisfiability_to_setsplitting — 7 tests pass
cargo clippy — clean
Canonical example registered and passing coverage gate
Verification artifacts removed from docs/paper/verify-reductions/

Closes #841

🤖 Generated with Claude Code

Typst: Construction (3 steps) + Correctness (⟹ + ⟸) + Extraction + Overhead + YES/NO examples Constructor: 32,513 checks, 0 failures (exhaustive n ≤ 5, 7 sections) Adversary: 46,774 checks, 0 failures (independent impl + hypothesis) Cross-comparison: 615 instances, all agree Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Implemented via /verify-reduction -> /add-reduction pipeline. Includes: reduction rule, 7 tests, canonical example in example_db. Verification artifacts cleaned up (79K+ Python checks preceded this Rust impl). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-04-01T10:51:24Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.04%. Comparing base (423506c) to head (bd516ef).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff            @@
##             main     #985    +/-   ##
========================================
  Coverage   98.03%   98.04%            
========================================
  Files         784      786     +2     
  Lines       82310    82461   +151     
========================================
+ Hits        80695    80846   +151     
  Misses       1615     1615

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Root cause: PRs #985 and #991 failed CI because: 1. Local clippy doesn't use -D warnings but CI does (caught needless_range_loop) 2. New reductions can create paths that dominate existing direct reductions (test_find_dominated_rules_returns_known_set has hardcoded known set) Added to Step 6: - Mandatory `cargo clippy -- -D warnings` (not just `cargo clippy`) - Mandatory `cargo test` (full suite, not filtered) - Explicit dominated-rules gate with fix instructions Added to Common Mistakes: - clippy without -D warnings - dominated rules test - skipping full cargo test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolves clippy -D warnings failure on Rust 1.94 CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: add design spec for proposed reductions Typst note Covers 9 reductions: 2 NP-hardness chain extensions (#973, #198), 4 Tier 1a blocked issues (#379, #380, #888, #822), and 3 Tier 1b blocked issues (#892, #894, #890). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add verify-reduction skill for mathematical verification of reductions New skill: /verify-reduction <issue-number> End-to-end pipeline that takes a reduction rule issue and produces: 1. Typst proof (Construction/Correctness/Extraction/Overhead + YES/NO examples) 2. Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5) 3. Lean 4 lemmas (non-trivial structural proofs required) Follows issue-to-pr conventions: creates worktree, works in isolation, submits PR. Strict quality gates (zero tolerance): - No "trivial" category — every reduction ≥5000 checks - 7 mandatory Python sections including NO (infeasible) example - Non-trivial Lean required (rfl/omega tautologies rejected) - Zero hand-waving in Typst ("clearly", "obviously" → rejected) - Mandatory gap analysis: every proof claim must have a test - Self-review checklist with 20+ items across 4 categories Developed and validated through PR #975 (800K+ checks, 3 bugs caught) and tested on issues #868 (caught wrong example) and #841 (35K checks). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: add YAML frontmatter + professional tone to verify-reduction skill - Added frontmatter (name, description) matching other skills' convention - Toned down aggressive language ("ZERO TOLERANCE", "THE HARSHEST STEP", "NON-NEGOTIABLE") to professional but firm language - All quality gates unchanged — same strictness, better presentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: adversarial multi-agent verification in verify-reduction skill Replaces Lean-required gates with adversarial second agent: - Step 5: Adversary agent independently implements reduce() and extract_solution() from theorem statement only (not constructor's script) - Step 5c: Cross-comparison of both implementations on 1000 instances - Lean downgraded from required to optional - hypothesis property-based testing for n up to 50 - Quality gates: 2 independent scripts ≥5000 checks each + cross-comparison Design rationale (docs/superpowers/specs/2026-04-01-adversarial-verification-design.md): - Same agent writing proof + test is the #1 risk for AI verification - Two independent implementations agreeing > one + trivial Lean - Lean caught 0 bugs in PR #975; Python caught all 4 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: design spec for verify-reduction skill enhancements Typst↔Python auto-matching, test vectors JSON for downstream consumption by add-rule and review-pipeline, adversary tailoring by reduction type, compositional verification via pred CLI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: implementation plan for verify-reduction enhancements 5 tasks: update verify-reduction (Step 4.5 auto-matching, Step 5 typed adversary, Step 8 downstream artifacts), create add-reduction skill, register in CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: enhance verify-reduction with test vectors export, typed adversary, pipeline integration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: create add-reduction skill — consumes verified artifacts from verify-reduction * feat: register add-reduction skill in CLAUDE.md, update verify-reduction description Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: three improvements to verify-reduction and add-reduction skills 1. verify-reduction Step 1: type compatibility gate — checks source/target Value types before proceeding. Stops and comments on issue if types are incompatible (e.g., optimization → decision needs K parameter). 2. add-reduction Step 7: mandatory cleanup of verification artifacts from docs/paper/verify-reductions/ — Python scripts, JSON, Typst, PDF must not get into the library. 3. add-reduction Steps 4/4b/5: mandatory requirements from #974 — canonical example in rule_builders.rs (Check 9), example-db lookup test (Check 10), paper reduction-rule entry (Check 11). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: concise verify-reduction (761→124 lines) + self-contained add-reduction verify-reduction: removed verbose templates, condensed checklists into prose, kept all requirements but removed boilerplate code blocks that the agent can derive from context. add-reduction: integrated add-rule Steps 1-6, write-rule-in-paper Steps 1-6, and #974 requirements (Checks 9/10/11) into a single self-contained skill. No need to read 3 other skills. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: restore structural requirements in verify-reduction (124→274 lines) The previous rewrite over-condensed the skill, removing gates that agents need to follow: 7-section descriptions with table, minimum check count table, check count audit template, gap analysis format, common mistakes table, and self-review checklist with checkboxes. Restored: all structural gates and requirements. Kept concise: no verbose Python/Typst code templates (agent derives these). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden add-reduction with file-level verification gates Steps 4, 4b, 5 now have HARD GATE labels with verification commands that check the SPECIFIC required files appear in `git diff --name-only`. Step 8 has a pre-commit gate that lists all 6 required files and blocks commit if any is missing. Root cause: subagents skipped Steps 4 (put example in rule file instead of rule_builders.rs) and 5 (skipped paper entry entirely) because the skill said "MANDATORY" but had no mechanical enforcement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add CI-equivalent checks to add-reduction pre-commit gate Root cause: PRs #985 and #991 failed CI because: 1. Local clippy doesn't use -D warnings but CI does (caught needless_range_loop) 2. New reductions can create paths that dominate existing direct reductions (test_find_dominated_rules_returns_known_set has hardcoded known set) Added to Step 6: - Mandatory `cargo clippy -- -D warnings` (not just `cargo clippy`) - Mandatory `cargo test` (full suite, not filtered) - Explicit dominated-rules gate with fix instructions Added to Common Mistakes: - clippy without -D warnings - dominated rules test - skipping full cargo test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: correct add-reduction HARD GATE for canonical examples rule_builders.rs is a 4-line pass-through — canonical examples are registered via canonical_rule_example_specs() in each rule file, wired through mod.rs. Updated Step 4 to match actual architecture. Also added analysis.rs to git add list (for dominated-rules updates). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: parent-side verification + pre-commit hook for add-reduction Two enforcement mechanisms that don't rely on subagent compliance: 1. Parent-side verification (Step 8a): After subagent reports DONE, the parent runs file gate checks independently. If any required file is missing, sends the subagent back — doesn't trust self-report. 2. Pre-commit hook (.claude/hooks/add-reduction-precommit.sh): Mechanically blocks commits of new rule files unless example_db.rs, reductions.typ, and mod.rs are also staged. Subagents cannot bypass. Root cause: subagents skip HARD GATE steps despite skill text saying "MANDATORY". Text-based enforcement doesn't work — need mechanical checks that run after the subagent, not instructions the subagent reads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: strengthen type compatibility gate in verify-reduction skill Expanded the type compatibility table to explicitly list all incompatible pairs (Min→Or, Max→Or, Max→Min, Min→Max, Or→Sum, etc.) with concrete examples from batch verification (#198 MVC→HamCircuit, #890 MaxCut→OLA). Added common mistake entry for proceeding past the type gate. Learned from batch run: 5 out of 50 reductions were mathematically verified but turned out to be unimplementable as ReduceTo due to type mismatches that the original gate didn't catch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Delete docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md * Delete docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md * Delete .claude/CLAUDE.md * Revert "Delete .claude/CLAUDE.md" This reverts commit 71c1444. * chore: remove docs/superpowers/specs/ directory Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: integrate verify-reduction into add-rule pipeline - Delete /add-reduction skill and pre-commit hook (absorbed into /add-rule) - /add-rule now runs /verify-reduction by default; --no-verify to skip - /verify-reduction simplified: no worktree, no PR, no saved artifacts - /issue-to-pr passes --no-verify flag through to /add-rule - Update CLAUDE.md skill descriptions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Jinguo Liu <cacate0129@gmail.com>

zazabap and others added 2 commits April 1, 2026 06:38

fix: replace needless_range_loop with enumerate().take(4) in NAE→SS test

bd516ef

Resolves clippy -D warnings failure on Rust 1.94 CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

zazabap closed this Apr 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add NAESatisfiability → SetSplitting reduction (#841) [full pipeline v2]#985

feat: add NAESatisfiability → SetSplitting reduction (#841) [full pipeline v2]#985
zazabap wants to merge 3 commits intomainfrom
feat/841-full-pipeline

zazabap commented Apr 1, 2026

Uh oh!

codecov bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

zazabap commented Apr 1, 2026

Summary

What's new vs PR #984

Commit structure

Pipeline flow demonstrated

Test plan

Uh oh!

codecov bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov bot commented Apr 1, 2026 •

edited

Loading